Conversation

ntjohnson1
Collaborator

@ntjohnson1 ntjohnson1 commented Jun 28, 2025

It would be nice to land #432 first since that resolves all of our doc warnings. I didn't fully merge it here so if that merges first I probably need to review this to make sure I didn't break any new stuff. Alternatively if we want to land this first I can revert the few unrelated commits (they made local iteration cleaner to avoid whatever mac precision thing I'm hitting).

This resolves #312. I added tests, but it's hard to say if things are totally correct. I never fully grokked the sparse generation example in the tutorial and didn't dig into it now, but the results all seem to look reasonable. There will probably be follow-on work after this to actually move usages over to this implementation, related to #363. This ended up being a larger task than expected, so I'm punting on that for now.

Style things:

  • Maybe we need a style doc
  • MATLAB has these big arbitrary dictionaries to capture outputs, plus liberal use of varargin. I haven't been super consistent on this, but dataclasses seem nice for some of it, and that is the direction I went here
    • The CPProblem vs ExistingCPSolution split is a little clunky, but pretty clear/explicit. If I added a solution field to CPProblem, we'd have a bit of a mess with default values and clarity around when the user wants to generate things or not. Open to opinions about the usability
  • If our inputs are clear and unambiguous, I don't think returning the input parameters in the output makes sense.
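To make the dataclass direction concrete, here is a hypothetical sketch of the pattern described above. The class names CPProblem and ExistingCPSolution and the noise/sparse_generation parameters come from this discussion; the other fields and defaults are illustrative, not pyttb's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the dataclass-based parameter pattern discussed above.
# Field names beyond the ones mentioned in this thread are illustrative.
@dataclass
class CPProblem:
    """Specify a CP problem whose solution should be generated from scratch."""
    shape: tuple        # tensor dimensions
    num_factors: int    # CP rank
    noise: float = 0.1  # relative noise added to the generated data

@dataclass
class ExistingCPSolution:
    """Generate data from a solution the caller already has."""
    solution: object    # an existing ktensor
    noise: float = 0.0
    sparse_generation: Optional[float] = None  # target nonzero fraction, if sparse

# Explicit construction makes the caller's intent unambiguous:
params = ExistingCPSolution(solution=None, noise=0.0, sparse_generation=0.5)
print(params.sparse_generation)
```

The upside of two separate dataclasses, as noted above, is that "generate a solution" and "use my solution" cannot be confused at the call site, at the cost of a slightly clunkier API surface.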

📚 Documentation preview 📚: https://pyttb--442.org.readthedocs.build/en/442/

@ntjohnson1 ntjohnson1 requested a review from dmdunla June 28, 2025 16:31
@dmdunla
Collaborator

dmdunla commented Aug 28, 2025

@ntjohnson1 Thanks, what a major undertaking!

This is a departure from the TTB for MATLAB implementation, in that 1) you need to choose a Problem class, add parameterization to that, and then instantiate a (data, solution) pair; and 2) missing-data parameterization is handled outside of the Problem class in the function signature, leading to two inputs as opposed to one in TTB (e.g., passing params again to make an identical copy).

Mapping to the TTB implementation, I now see the challenges, as there are so many options and combinations of options. So, I am OK with this for now. I will explicitly reach out to users to request feedback once this lands in its initial form.

@ntjohnson1
Collaborator Author

I will explicitly reach out to users to request feedback once this lands in its initial form.

Open to feedback or a design discussion if people have thoughts. I gave this some consideration when I put it together, but biased towards making expectations more explicit/clear, which might hurt usability a little. I'm sure there are at least some ergonomic improvements to be had.

@dmdunla
Collaborator

dmdunla commented Aug 29, 2025

@ntjohnson1 Everything seems to align with TTB for MATLAB, except the Sparse Problem creation. The number of nonzeros seems to be significantly lower in pyttb. When I ask for 50% sparsity, TTB for MATLAB returns num_nonzeros near 0.5 * prod(size(S)) [just a bit lower, but you can estimate this using sampling with replacement, and it is predicted to be lower than 50%]. However, in pyttb, the number is often off by a factor of ~3 (or a bit more):

sz = [20 15 10];
nf = 4;
A = cell(3,1);
for n = 1:length(sz)
    A{n} = rand(sz(n), nf);
    for r = 1:nf
        p = randperm(sz(n));
        idx = p(1:round(.2*sz(n)));
        A{n}(idx,r) = 10 * A{n}(idx,r);
    end
end
S = ktensor(A);
S = normalize(S,'sort',1);

info = create_problem('Soln', S, 'Sparse_Generation', 0.5);
num_nonzeros = nnz(info.Data)
total_insertions = sum(info.Data.vals)

num_nonzeros =

   718


total_insertions =

        1500
import numpy as np
import pyttb as ttb
from pyttb.create_problem import ExistingCPSolution, create_problem

shape = (20, 15, 10)
num_factors = 4
A = []
for n in range(len(shape)):
    A.append(np.random.rand(shape[n], num_factors))
    for r in range(num_factors):
        p = np.random.permutation(np.arange(shape[n]))
        idx = p[1 : round(0.2 * shape[n])]
        A[n][idx, r] *= 10
S = ttb.ktensor(A)
S.normalize(sort=True)

existing_params = ExistingCPSolution(S, noise=0.0, sparse_generation=0.5)
solution, data = create_problem(existing_params)
print(
    f"num_nonzeros: {data.nnz}\n"
    f"total_insertions: {np.sum(data.vals)}\n"
)

num_nonzeros: 258
total_insertions: 1500.0
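The "predicted to be lower than 50%" remark can be sanity-checked without any tensor code. This is a standalone numpy sketch, not pyttb: under uniform sampling with replacement, k insertions into N cells hit about N * (1 - (1 - 1/N)**k) distinct cells, and skewed probabilities (a stand-in for the ktensor-weighted cells used here) drive the distinct count lower still.

```python
import numpy as np

# Standalone sanity check (not pyttb code): how many distinct cells do
# k = 0.5 * N insertions hit when sampling with replacement?
N = 20 * 15 * 10   # total cells for shape (20, 15, 10)
k = N // 2         # 1500 requested insertions

# Uniform sampling: expected number of distinct cells hit.
uniform_expected = N * (1 - (1 - 1 / N) ** k)
print(f"uniform expected distinct: {uniform_expected:.0f}")

# Skewed cell probabilities (stand-in for ktensor-weighted sampling)
# concentrate insertions on fewer cells, lowering the distinct count.
rng = np.random.default_rng(0)
p = rng.random(N) ** 3
p /= p.sum()
distinct = np.unique(rng.choice(N, size=k, p=p)).size
print(f"skewed sample distinct: {distinct}")
```

So even a correct implementation lands below k = 1500 distinct nonzeros; the question is how far below, which is what makes the factor-of-~3 gap above suspicious.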

I am willing to merge for now and then add an Issue to address this specific discrepancy. What do you think of that approach?

@ntjohnson1
Collaborator Author

I am willing to merge for now and then add an Issue to address this specific discrepancy. What do you think of that approach?

Sounds reasonable to me, or we could just disable sparse generation for now. That setting is the only thing that routes through generate_data_sparse, which appears to be broken.

@dmdunla
Collaborator

dmdunla commented Aug 29, 2025

Sounds reasonable to me or we could just disable sparse generation for now. That setting is the only thing that routes through generate_data_sparse which appears to be broken.

Thanks, I'll merge it and add a new Issue for sparse_generation.

@ntjohnson1
Collaborator Author

If you have more bandwidth to look at this: it looks like I normalized the factor matrices incorrectly, so the probabilities summed to more than 1, which caused the issue. With your sample script this yields much more reasonable results (~700-740). But I'm also fine taking this over to the follow-up.

diff --git a/pyttb/create_problem.py b/pyttb/create_problem.py
index 36d1c97..f14d2a6 100644
--- a/pyttb/create_problem.py
+++ b/pyttb/create_problem.py
@@ -593,7 +593,7 @@ def generate_data_sparse(
 
     # Convert solution to probability tensor
     # NOTE: Make copy since normalize modifies in place
-    P = solution.copy().normalize(mode=0)
+    P = solution.copy().normalize(normtype=1)
     eta = np.sum(P.weights)
     P.weights /= eta
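For anyone following along, the root cause is which norm the normalization uses: to treat the ktensor as a probability tensor, each factor column must sum to 1 (1-norm), whereas the original call scaled only mode 0 and used the default 2-norm, leaving "probabilities" whose columns sum to more than 1. A minimal numpy illustration of the difference (not the pyttb normalize API itself):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 2))  # a nonnegative factor matrix

# 2-norm scaling (analogous to the buggy call): columns get unit 2-norm,
# but for positive entries each column's SUM then exceeds 1.
A_l2 = A / np.linalg.norm(A, axis=0)

# 1-norm scaling (analogous to the fix): columns sum to 1, i.e. valid
# probability distributions over the mode's indices.
A_l1 = A / A.sum(axis=0)

print(A_l2.sum(axis=0))  # entries > 1 for columns with several positive entries
print(A_l1.sum(axis=0))  # exactly 1 per column
```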

@dmdunla
Collaborator

dmdunla commented Aug 29, 2025

If you have more bandwidth to look at this: it looks like I normalized the factor matrices incorrectly, so the probabilities summed to more than 1, which caused the issue. With your sample script this yields much more reasonable results (~700-740). But I'm also fine taking this over to the follow-up.

  • -P = solution.copy().normalize(mode=0)
  • +P = solution.copy().normalize(normtype=1)

Got it, this makes sense. Thanks for the catch! I'll make the change.

@dmdunla dmdunla merged commit 843cba0 into sandialabs:main Aug 29, 2025
10 checks passed
Successfully merging this pull request may close these issues.

Implement create_problem from Matlab TTB